Amendments to the Claims 

This listing of claims will replace all prior versions, and listings of claims in the 
application: 

Listing of Claims: 

1 1. (original): A system for analyzing unstructured documents for 

2 conceptual relationships, comprising: 

3 a histogram module determining a frequency of occurrences of concepts in 

4 a set of unstructured documents, each concept representing an element occurring 

5 in one or more of the unstructured documents; 

6 a selection module selecting a subset of concepts out of the frequency of 

7 occurrences, grouping one or more concepts from the concepts subset, and 

8 assigning weights to one or more clusters of concepts for each group of concepts; 

9 and 

10 a best fit module calculating a best fit approximation for each document 

11 indexed by each such group of concepts between the frequency of occurrences 

12 and the weighted cluster for each such concept grouped into the group of 

13 concepts. 
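For illustration only, the frequency determination recited for the histogram module might be sketched as below. The input layout (a mapping from document identifier to a list of already-extracted concepts) and the function names are assumptions made for this sketch; the claim does not prescribe them.

```python
from collections import Counter


def concept_frequencies(documents):
    """Map each document id to a Counter of concept -> occurrence count."""
    return {doc_id: Counter(concepts) for doc_id, concepts in documents.items()}


def corpus_frequencies(per_document):
    """Sum the per-document counts into one corpus-wide Counter."""
    total = Counter()
    for counts in per_document.values():
        total.update(counts)
    return total


# Hypothetical input: concepts already extracted from each document.
documents = {
    "doc1": ["contract", "breach", "contract"],
    "doc2": ["breach", "damages"],
}
per_document = concept_frequencies(documents)
corpus = corpus_frequencies(per_document)
```

The per-document counters correspond to a histogram for each document, and the summed counter to a view over the whole documents set.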

1 2. (original): A system according to Claim 1, further comprising: 

2 an extraction module extracting features from each of the unstructured 

3 documents and normalising the extracted features into the concepts. 

1 3. (original): A system according to Claim 2, further comprising: 

2 a structured database storing the extracted features as uniquely identified 

3 records. 

1 4. (original): A system according to Claim 1, further comprising: 

2 a visualization module visualizing the frequency of occurrences, 

3 comprising at least one of creating a histogram mapping the frequency of 

4 occurrences for each document in the unstructured documents set and creating a 
corpus graph mapping the frequency of occurrence for all such documents in the 
unstructured documents set. 

5. (original): A system according to Claim 1, further comprising: 

a threshold comprising a median and edge conditions, each such concept 

in the concepts subset occurring within the edge conditions. 

6. (original): A system according to Claim 1, further comprising: 

an inner product module determining, for each group of concepts, the best 
fit approximation as the inner product between the frequency of occurrences and 
the weighted cluster for each such concept in the group of concepts. 

7. (original): A system according to Claim 6, wherein the inner 
product $d_{cluster}$ is calculated according to the equation comprising: 

$$d_{cluster} = \sum doc_{concept} \cdot cluster_{concept}$$

where $doc_{concept}$ represents the frequency of occurrence for a given concept in the 
document and $cluster_{concept}$ represents the weight for a given cluster. 

8. (original): A system according to Claim 1, further comprising: 
a control module iteratively re-determining the best fit approximation 

responsive to a change in the set of unstructured documents. 

9. (original): A method for analyzing unstructured documents for 
conceptual relationships, comprising: 

determining a frequency of occurrences of concepts in a set of 
unstructured documents, each concept representing an element occurring in one or 
more of the unstructured documents; 

selecting a subset of concepts out of the frequency of occurrences; 

grouping one or more concepts from the concepts subset; 

assigning weights to one or more clusters of concepts for each group of 
concepts; and 
10 calculating a best fit approximation for each document indexed by each 

11 such group of concepts between the frequency of occurrences and the weighted 

12 cluster for each such concept grouped into the group of concepts. 

1 10. (original): A method according to Claim 9, further comprising: 

2 extracting features from each of the unstructured documents; and 

3 normalizing the extracted features into the concepts. 

1 11. (original): A method according to Claim 10, further comprising: 

2 storing the extracted features as uniquely identified records in a structured 

3 database. 

1 12. (original): A method according to Claim 9, further comprising: 

2 visualizing the frequency of occurrences, comprising at least one of: 

3 creating a histogram mapping the frequency of occurrences for 

4 each document in the unstructured documents set; and 

5 creating a corpus graph mapping the frequency of occurrence for 

6 all such documents in the unstructured documents set. 

1 13. (original): A method according to Claim 9, further comprising: 

2 defining a threshold comprising a median and edge conditions, each such 

3 concept in the concepts subset occurring within the edge conditions. 
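As a rough sketch of the threshold step, the selection could look like the following. It assumes the edge conditions are offsets below and above the median corpus frequency; the claim does not say how the edge conditions are chosen, and `select_within_edges`, `lower_offset`, and `upper_offset` are hypothetical names.

```python
from statistics import median


def select_within_edges(corpus_counts, lower_offset, upper_offset):
    """Keep concepts whose corpus frequency lies within the edge conditions.

    Assumption: the edges are the median frequency minus lower_offset and
    the median frequency plus upper_offset, both inclusive.
    """
    mid = median(corpus_counts.values())
    low, high = mid - lower_offset, mid + upper_offset
    return {c: n for c, n in corpus_counts.items() if low <= n <= high}


# Hypothetical corpus-wide frequencies of occurrence.
corpus_counts = {"the": 90, "contract": 12, "breach": 10, "damages": 8, "zyzzyva": 1}
subset = select_within_edges(corpus_counts, lower_offset=3, upper_offset=3)
```

With these sample counts the median is 10, so very common and very rare concepts fall outside the edges and are excluded from the concepts subset.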

1 14. (original): A method according to Claim 9, further comprising: 

2 for each group of concepts, determining the best fit approximation as the 

3 inner product between the frequency of occurrences and the weighted cluster for 

4 each such concept in the group of concepts. 

1 15. (original): A method according to Claim 14, wherein the inner 

2 product $d_{cluster}$ is calculated according to the equation comprising: 

3 $d_{cluster} = \sum doc_{concept} \cdot cluster_{concept}$

4 where $doc_{concept}$ represents the frequency of occurrence for a given concept in the 

5 document and $cluster_{concept}$ represents the weight for a given cluster. 
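A minimal sketch of the inner product between a document's frequencies of occurrence and a weighted cluster follows. The dictionary representation and the name `inner_product` are assumptions for illustration, as is the choice to treat a concept absent from the document as having a frequency of zero.

```python
def inner_product(doc_counts, cluster_weights):
    """Sum frequency * weight over the concepts in the cluster.

    Concepts missing from the document contribute a frequency of zero.
    """
    return sum(
        doc_counts.get(concept, 0) * weight
        for concept, weight in cluster_weights.items()
    )


# Hypothetical document frequencies and cluster weights.
doc_counts = {"contract": 3, "breach": 1}
cluster_weights = {"contract": 0.5, "breach": 2.0, "damages": 1.0}
score = inner_product(doc_counts, cluster_weights)
```

Here the score is 3 x 0.5 + 1 x 2.0 + 0 x 1.0 = 3.5; a larger value indicates a closer fit between the document and the cluster under this measure.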

1 16. (original): A method according to Claim 9, further comprising: 

2 iteratively re-determining the best fit approximation responsive to a 

3 change in the set of unstructured documents. 

1 17. (original): A computer-readable storage medium holding code for 

2 performing the method according to Claims 9, 10, 11, 12, 13, 14, 15, or 16. 

1 18. (original): A system for dynamically evaluating latent concepts in 

2 unstructured documents, comprising: 

3 an extraction module extracting a multiplicity of concepts from a set of 
4 unstructured documents into a lexicon uniquely identifying each concept and a 

5 frequency of occurrence; 

6 a frequency mapping module creating a frequency of occurrence 

7 representation for each documents set, the representation providing an ordered 

8 corpus of the frequencies of occurrence of each concept; 

9 a concept selection module selecting a subset of concepts from the 

10 frequency of occurrence representation filtered against a minimal set of concepts 

11 each referenced in at least two documents with no document in the corpus being 

12 unreferenced; 

13 a group generation module generating a group of weighted clusters of 

14 concepts selected from the concepts subset; and 

15 a best fit module determining a matrix of best fit approximations for each 

16 document weighted against each group of weighted clusters of concepts. 
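One possible reading of the concept selection constraint (each selected concept referenced in at least two documents, with no document left unreferenced) is sketched below. It applies the constraint as a simple filter plus a coverage check rather than computing a minimum covering set, and the name `select_concepts` and the input layout are assumptions.

```python
def select_concepts(per_document):
    """Return the concepts referenced in at least two documents.

    Raises ValueError if any document contains none of the selected
    concepts, since no document may be left unreferenced.
    """
    doc_count = {}
    for concepts in per_document.values():
        for concept in set(concepts):
            doc_count[concept] = doc_count.get(concept, 0) + 1
    selected = {c for c, n in doc_count.items() if n >= 2}
    unreferenced = [
        doc_id
        for doc_id, concepts in per_document.items()
        if not selected.intersection(concepts)
    ]
    if unreferenced:
        raise ValueError(f"documents left unreferenced: {unreferenced}")
    return selected


# Hypothetical per-document frequencies of occurrence.
per_document = {
    "doc1": {"contract": 2, "breach": 1},
    "doc2": {"breach": 1, "damages": 1},
    "doc3": {"damages": 3, "venue": 1},
}
selected = select_concepts(per_document)
```

In this sample, "contract" and "venue" each appear in only one document and are dropped, while every document still contains at least one selected concept.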

1 19. (original): A system according to Claim 18, further comprising: 

2 a histogram module creating a histogram mapping the frequency of 

3 occurrence representation for each document in the documents set. 

1 20. (original): A system according to Claim 19, further comprising: 

2 a data mining module mining the multiplicity of concepts from each 

3 document as at least one of a noun, noun phrase and tri-gram. 

1 21. (original): A system according to Claim 19, further comprising: 

2 a normalizing module normalizing the multiplicity of concepts into a 

3 substantially uniform lexicon. 

1 22. (original): A system according to Claim 21, wherein the 

2 substantially uniform lexicon is in third normal form. 

1 23. (original): A system according to Claim 18, further comprising: 

2 a corpus mapping module creating a corpus graph mapping the frequency 

3 of occurrence representation for all documents in the documents set. 

1 24. (original): A system according to Claim 18, further comprising: 

2 a threshold module defining the pre-defined threshold as a median value 

3 and a set of edge conditions and choosing those concepts falling within the edge 

4 conditions as the concepts subset. 

1 25. (original): A system according to Claim 18, further comprising: 

2 a cluster module naming one or more of the concepts within the concepts 

3 subset to a cluster and assigning a weight to each concept with each such cluster. 

1 26. (original): A system according to Claim 25, further comprising: 

2 a group module grouping one or more of the clusters into each such group 

3 of weighted clusters of concepts. 

1 27. (original): A system according to Claim 18, further comprising: 

2 a Euclidean module calculating a Euclidean distance between the 

3 frequency of occurrence for each document and a corresponding weighted cluster. 

1 28. (original): A system according to Claim 18, further comprising: 

2 an iteration module removing select documents from the documents set and 

3 iteratively reevaluating the matrix of best fit approximations based on a revised 

4 frequency of occurrence representation and concepts subset. 

1 29. (original): A system according to Claim 18, further comprising: 

2 a structured database storing the lexicon, the lexicon comprising a 

3 plurality of records each uniquely identifying one such concept and an associated 

4 frequency of occurrence. 

1 30. (original): A system according to Claim 29, wherein the structured 

2 database is an SQL database. 

1 31. (original): A method for dynamically evaluating latent concepts in 

2 unstructured documents, comprising: 

3 extracting a multiplicity of concepts from a set of unstructured documents 

4 into a lexicon uniquely identifying each concept and a frequency of occurrence; 

5 creating a frequency of occurrence representation for each documents set, 

6 the representation providing an ordered corpus of the frequencies of occurrence of 

7 each concept; 

8 selecting a subset of concepts from the frequency of occurrence 

9 representation filtered against a minimal set of concepts each referenced in at least 

10 two documents with no document in the corpus being unreferenced; 

11 generating a group of weighted clusters of concepts selected from the 

12 concepts subset; and 

13 determining a matrix of best fit approximations for each document 

14 weighted against each group of weighted clusters of concepts. 

1 32. (original): A method according to Claim 31, further comprising: 

2 creating a histogram mapping the frequency of occurrence representation 

3 for each document in the documents set. 

1 33. (original): A method according to Claim 32, further comprising: 
2 mining the multiplicity of concepts from each document as at least one of 

3 a noun, noun phrase and tri-gram. 

1 34. (currently amended): A method according to Claim [[33]] 31, 

2 further comprising: 

3 normalizing the multiplicity of concepts into a substantially uniform 

4 lexicon. 

1 35. (original): A method according to Claim 34, wherein the 

2 substantially uniform lexicon is in third normal form. 

1 36. (original): A method according to Claim 31, further comprising: 

2 creating a corpus graph mapping the frequency of occurrence 

3 representation for all documents in the documents set. 

1 37. (original): A method according to Claim 31, further comprising: 

2 defining the pre-defined threshold as a median value and a set of edge 

3 conditions; and 

4 choosing those concepts falling within the edge conditions as the concepts 

5 subset. 

1 38. (original): A method according to Claim 31, further comprising: 

2 naming one or more of the concepts within the concepts subset to a 

3 cluster; and 

4 assigning a weight to each concept with each such cluster. 

1 39. (original): A method according to Claim 38, further comprising: 

2 grouping one or more of the clusters into each such group of weighted 

3 clusters of concepts. 

1 40. (original): A method according to Claim 31, further comprising: 

2 calculating a Euclidean distance between the frequency of occurrence for 

3 each document and a corresponding weighted cluster. 
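The Euclidean distance between a document and a weighted cluster might be computed as in the sketch below. It assumes both are expressed as vectors over the same concepts, with a missing concept treated as zero; the claim does not specify this alignment, and `euclidean_distance` is a hypothetical name.

```python
import math


def euclidean_distance(doc_counts, cluster_weights):
    """Distance between document frequencies and cluster weights.

    Assumption: both are vectors over the union of their concepts, and a
    concept missing from either side counts as zero.
    """
    concepts = set(doc_counts) | set(cluster_weights)
    return math.sqrt(
        sum(
            (doc_counts.get(c, 0) - cluster_weights.get(c, 0)) ** 2
            for c in concepts
        )
    )


# Hypothetical document frequencies compared against an all-zero cluster.
distance = euclidean_distance({"contract": 3, "breach": 4}, {"contract": 0, "breach": 0})
```

Unlike an inner product, a smaller distance indicates a closer fit, so the two measures rank documents in opposite directions.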
1 41. (original): A method according to Claim 31, further comprising: 

2 removing select documents from the documents set; and 

3 iteratively reevaluating the matrix of best fit approximations based on a 

4 revised frequency of occurrence representation and concepts subset. 

1 42. (original): A method according to Claim 31, further comprising: 

2 storing the lexicon in a structured database, the lexicon comprising a 

3 plurality of records each uniquely identifying one such concept and an associated 

4 frequency of occurrence. 

1 43. (original): A method according to Claim 42, wherein the structured 

2 database is an SQL database. 

1 44. (original): A computer-readable storage medium holding code for 

2 performing the method according to Claims 31, 32, 33, 34, 36, 37, 38, 39, 40, 41, 

3 or 42. 
